library(dplyr)
library(sf)
library(ggplot2)
library(osmdata)
library(leaflet)
library(nominatimlite)
Open Street Map (OSM) is a collaborative project which aims at mapping the world and sharing geospatial data in an open way. Anyone can contribute, by mapping geographical objects their encounter, by adding topical information on existing map objects (their name, function, capacity, etc.), or by mapping buildings and roads from satellite imagery (cf. HOT: Humanitarian OpenStreetMap Team).
This information is then validated by other users and eventually added to the common “map” or information system. This ensures that the information is accessible, open, verified, accurate and up-to-date.
The result looks like this: The geospatial data underlying this
interface is made of geometrical objects (i.e. points, lines, polygons)
and their associated tags (#building #height, #road #secondary #90kph,
etc.).
The first thing to do is to define the area within which you want to
retrieve data, aka the bounding box. This can be defined easily
using a place name and the package nominatimlite to access
the free Nominatim API provided by OpenStreetMap.
We are going to look at Brielle together, but you can also work with the small cities of Naarden, Geertruidenberg, Gorinchem, Enkhuizen or Dokkum.
/! This might not be needed, but ensures that your machine knows it has internet…
assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
nominatim_polygon <- nominatimlite::geo_lite_sf(address = "Brielle", points_only = FALSE)
bb <- sf::st_bbox(nominatim_polygon)
bb
xmin ymin xmax ymax
4.135294 51.883500 4.229222 51.931410
There might multiple responses from the API query, corresponding to different objects at the same location, or different objects at different locations. For example: Brielle (Netherlands) and Brielle (New Jersey)
We should therefore try to be as unambiguous as possible by adding a country code or district name.
nominatim_polygon <- nominatimlite::geo_lite_sf(address = "Brielle, NL", points_only = FALSE)
bb <- sf::st_bbox(nominatim_polygon)
bb
xmin ymin xmax ymax
4.135294 51.883500 4.229222 51.931410
A feature in
the OSM language is a category or tag of a geospatial object. Features
are described by general keys (e.g. “building”, “boundary”, “landuse”,
“highway”), themselves decomposed into sub-categories (values) such as
“farm”, “hotel” or “house” for buildings, “motorway”,
“secondary” and “residential” for highway. This determines
how they are represented on the map.
Let’s say we want to download data from OpenStreetMap and we know
there is a package for it named osmdata, but we don’t know
which function to use and what arguments are needed. Where should we
start?
Let’s check the documentation online:
The OSMdata Documentation page
It appears that there is a function to extract features, using the
Overpass API. This function’s name is opq (for
OverPassQuery) which, in combination with add_osm_feature,
seems to do the job. However it might not be crystal clear how to apply
it to our case. Let’s click on the function name to know more.
The Overpass Query Documentation page
On this page we can read about the arguments needed for each
function: a bounding box for opq() and some
key and value for
add_osm_feature(). Thanks to the examples provided, we can
assume that these keys and values correspond to different levels of tags
from the OSM classification. In our case, we will keep it at the first
level of classification, with “buildings” as key, and no
value. We also see from the examples that another function is needed
when working with the sf package:
osmdata_sf(). This ensures that the type of object is
suited for sf. With these tips and examples, we can write
our feature extraction function as follows:
x <- opq(bbox = bb) %>%
add_osm_feature(key = 'building') %>%
osmdata_sf()
What is this x object made of? It is a table of all the buildings contained in the bounding box, which gives us their OSM id, their geometry and a range of attributes, such as their name, building material, building date, etc. The completion level of this table depends on user contributions and open resources (here for instance: BAG, different in other countries).
head(x$osm_polygons)
For instance: the building age focusing on post-1900 buildings.
First, we are going to select the polygons and reproject them with the Amersfoort/RD New projection, suited for maps centred on the Netherlands. This code for this projection is: 28992.
buildings <- x$osm_polygons %>%
st_transform(.,crs=28992)
Then we create a variable which a threshold at 1900. Every date prior to 1900 will be recoded 1900, so that buildings older than 1900 will be represented with the same shade.
Then we use the ggplot function to visualise the
buildings by age. The specific function to represent information as a
map is geom_sf(). The rest works like other graphs and
visualisation, with aes() for the aesthetics.
buildings$build_date <- as.numeric(
if_else(
as.numeric(buildings$start_date) < 1900,
1900,
as.numeric(buildings$start_date)
)
)
ggplot(data = buildings) +
geom_sf(aes(fill = build_date, colour=build_date)) +
scale_fill_viridis_c(option = "viridis")+
scale_colour_viridis_c(option = "viridis")
So this reveals the historical centre of [city] and the various extensions. Anything odd? what? around the centre? Why these limits / isolated points?
Leaflet (20 min)addPolygons function.fillColor of these polygons represent the
build_date variable. Choropleth
documentation. Using the example and replace the variable names
where needed.sf packagesf?sf is a package which supports simple features (sf), “a standardized
way to encode spatial vector data.”. It contains a large set of
functions to achieve all the operations on vector spatial data for which
you might use traditional GIS software: change the coordinate system,
join layers, intersect or unit polygons, create buffers and centroids,
etc. cf. the sf cheatsheet.
Let’s focus on old buildings and imagine we’re in charge of their conservation. We want to know how much of the city would be affected by a non-construction zone of 100m around pre-1800 buildings.
Let’s select them and see where they are.
old <- 1800 # year prior to which you consider a building old
summary(buildings$start_date)
Length Class Mode
10608 character character
buildings$start_date <- as.numeric(as.character(buildings$start_date))
old_buildings <- buildings %>%
filter(start_date <= old)
ggplot(data = old_buildings) + geom_sf(colour="red")
As conservationists, we want to create a zone around historical
buildings where building regulation will have special restrictions to
preserve historical buildings.
Let’s say this zone should be 100 meters. In GIS terms, we want to
create a buffer around polygons. The corresponding function
sf is st_buffer, with 2 arguments: the
polygons around which to create buffers, and the radius of the
buffer.
distance <- 100 # in meters
buffer_old_buildings <-
st_buffer(x = old_buildings, dist = distance)
ggplot(data = buffer_old_buildings) + geom_sf()
Now, we have a lot of overlapping buffers. We would rather create a
unique conservation zone rather than overlapping ones in that case. So
we have to fuse the overlapping buffers into one polygon. This operation
is called union and the corresponding function is
st_union.
single_old_buffer <- st_union(buffer_old_buildings) %>%
st_cast(to = "POLYGON") %>%
st_as_sf()
single_old_buffer<- single_old_buffer %>%
mutate("ID"=as.factor(1:nrow(single_old_buffer))) %>%
st_transform(.,crs=28992)
We also use st_cast() to explicit the type of the
resulting object (POLYGON instead of the default
MULTIPOLYGON), st_as_sf() to transform the polygon
into an sf object.
For the sake of visualisation speed, we would like to represent
buildings by a single point (for instance: their geometric centre)
rather than their actual footprint. This operation means defining their
centroid and the corresponding function is st_centroid.
sf::sf_use_s2(FALSE)
centroids_old <- st_centroid(old_buildings) %>%
st_transform(.,crs=28992)
ggplot() +
geom_sf(data = single_old_buffer, aes(fill=ID)) +
geom_sf(data = centroids_old)
Now what we would like to distinguish conservation areas based on the
number of historic buildings they contain. In GIS terms, we would like
to know how many centroids each fused buffer polygon contains. This
operation means intersecting the layer of polygons with the layer of
points and the corresponding function is
st_intersection.
centroids_buffers <-
st_intersection(centroids_old,single_old_buffer) %>%
mutate(n=1)
centroid_by_buffer <- centroids_buffers %>%
group_by(ID) %>%
summarise(n = sum(n))
single_buffer <- single_old_buffer %>%
mutate(n_buildings = centroid_by_buffer$n)
ggplot() +
geom_sf(data = single_buffer, aes(fill=n_buildings)) +
scale_fill_viridis_c(alpha = 0.8,
begin = 0.6,
end = 1,
direction = -1,
option = "B")
Let’s map this layer over the initial map of individual buildings.
ggplot() +
geom_sf(data = buildings) +
geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
scale_fill_viridis_c(alpha = 0.6,
begin = 0.6,
end = 1,
direction = -1,
option = "B")
old <- 1939
distance <- 10
#select
old_buildings <- buildings %>%
filter(start_date <= old)
#buffer
buffer_old_buildings <- st_buffer(old_buildings, dist = distance)
#union
single_old_buffer <- st_union(buffer_old_buildings) %>%
st_cast(to = "POLYGON") %>%
st_as_sf()
single_old_buffer <- single_old_buffer %>%
mutate("ID"=1:nrow(single_old_buffer)) %>%
st_transform(single_old_buffer,crs=4326)
#centroids
centroids_old <- st_centroid(old_buildings) %>%
st_transform(.,crs=4326)
#intersection
centroids_buffers <- st_intersection(centroids_old,single_old_buffer) %>%
mutate(n=1)
centroid_by_buffer <- centroids_buffers %>%
group_by(ID) %>%
summarise(
n = sum(n)
)
single_buffer <- single_old_buffer %>%
mutate(n_buildings = centroid_by_buffer$n)
ggplot() +
geom_sf(data = buildings) +
geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
scale_fill_viridis_c(alpha = 0.6,
begin = 0.6,
end = 1,
direction = -1,
option = "B")
Problem: there are many pre-war buildings and the buffers are large so the number of old buildings is not very meaningful. Let’s compute the density of old buildings per buffer zone.
single_buffer$area <- st_area(single_buffer, ) %>% units::set_units(., km^2)
single_buffer$old_buildings_per_km2 <- as.numeric(single_buffer$n_buildings / single_buffer$area)
ggplot() +
geom_sf(data = buildings) +
geom_sf(data = single_buffer, aes(fill=old_buildings_per_km2), colour = NA) +
scale_fill_viridis_c(alpha = 0.6,
begin = 0.6,
end = 1,
direction = -1,
option = "B")
We have seen how OpenStreetMap (OSM) geodata works and how to import, select, and visualise OSM vector data. We have seen how to create spatial buffers and centroids, how to intersect vector data and how retrieve the area of polygons.
In short:
osmdata, ggplot and
sf packagesst_* functions for basic GIS operations